Doctoral Thesis: Techniques for Improving Voice Conversion Based on Eigenvoices
Abstract
Voice conversion (VC) is a technique for converting a source speaker’s voice into another speaker’s voice without changing the linguistic information. A typical approach to VC is a statistical method based on a Gaussian mixture model (GMM), which is widely used. A GMM is trained as a conversion model using a parallel data set composed of many utterance pairs from the source and target speakers. Although this framework works reasonably well, the quality of the converted speech is still insufficient and the training process of the conversion model lacks flexibility. Eigenvoice conversion (EVC) is an effective method for making the training process more flexible. An eigenvoice GMM (EV-GMM) is trained in advance with multiple parallel data sets consisting of a single pre-defined speaker and many pre-stored speakers. Then, a conversion model for a new speaker is flexibly built by adapting the EV-GMM using a few arbitrary utterances of the new speaker. Two main frameworks have been proposed based on EVC: 1) one-to-many EVC, which allows conversion from a single source speaker’s voice into an arbitrary target speaker’s voice; and 2) many-to-one EVC, which allows conversion in the reverse direction. Although these frameworks achieve much higher flexibility than traditional VC, limitations remain in building a conversion model between an arbitrary speaker pair. In addition, the conversion performance of EVC is significantly degraded because the EV-GMM captures acoustic variations among the pre-stored target speakers. To make VC applications more practical, it is indispensable to improve the conversion performance and to develop a more flexible training framework.
∗Doctoral Thesis, Department of Information Processing, Graduate School of Information Science, Nara Institute of Science and Technology, NAIST-IS-DD0761009, February 4, 2010.
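As a rough illustration of the two ideas in the abstract, the sketch below shows a toy one-dimensional GMM-based conversion and an eigenvoice-style adaptation of the target means. The abstract does not give formulas, so everything here is an assumption: the conversion uses the common minimum mean-square-error mapping for a joint-density GMM, and the adaptation models each target mean as a bias plus a weighted sum of eigenvectors. All function and parameter names are hypothetical, and real systems work on multi-dimensional spectral features with trained parameters.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x, weights, means_x, vars_x):
    """Posterior probability of each mixture component given source feature x.

    Toy code: no guard against numerical underflow for x far from all means.
    """
    likes = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means_x, vars_x)]
    total = sum(likes)
    return [like / total for like in likes]


def convert(x, weights, means_x, vars_x, means_y, cov_yx):
    """Assumed MMSE mapping: posterior-weighted sum of per-component regressions."""
    post = posteriors(x, weights, means_x, vars_x)
    return sum(
        p * (my + (c / vx) * (x - mx))
        for p, mx, vx, my, c in zip(post, means_x, vars_x, means_y, cov_yx)
    )


def adapted_target_means(bias, eigenvectors, speaker_weights):
    """Assumed eigenvoice form: mean_m = bias_m + sum_j eigenvectors[m][j] * w_j."""
    return [
        b + sum(e * w for e, w in zip(ev, speaker_weights))
        for b, ev in zip(bias, eigenvectors)
    ]


# Hypothetical two-component model with two eigenvectors per component.
weights = [0.5, 0.5]
means_x, vars_x = [-1.0, 1.0], [1.0, 1.0]
cov_yx = [0.5, 0.5]
means_y = adapted_target_means(
    bias=[0.0, 2.0],
    eigenvectors=[[1.0, 0.0], [0.0, 2.0]],
    speaker_weights=[0.5, 0.25],
)
converted = convert(0.3, weights, means_x, vars_x, means_y, cov_yx)
```

In this sketch only `speaker_weights` changes from one target speaker to another, which is one way to picture why a few utterances of a new speaker can be enough to adapt the model.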
Similar resources
Cross-language voice conversion based on eigenvoices
This paper presents a novel cross-language voice conversion (VC) method based on eigenvoice conversion (EVC). Cross-language VC is a technique for converting voice quality between two speakers who speak different languages. In general, parallel data consisting of utterance pairs of those two speakers are not available. To deal with this problem, we apply EVC to cross-language VC. First...
Research as transition instrument: A phenomenological investigation of future image in Ph.D. thesis writing
This research investigates Shiraz University doctoral students’ perspectives on thesis writing. The data were gathered through in-depth interviews with eight doctoral students. Based on an abductive research strategy and interpretative phenomenology, the findings show that a Ph.D. thesis doesn’t have a place in the big picture of their lives. Themes abstracted fro...
Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems
This article examines methods for improving the performance of GMM-based voice conversion systems, taking the GMM method as the baseline. In current methods, because a single conversion function is used for all speech units and statistical averaging then smooths the spectrum, we will observe quality ...
A Comparative Study of the Defense of Nursing PhD Thesis in Iran and Top United States Universities
Background: The most important event in the doctoral course is the completion and defense of the dissertation, which leads to learning and improving the necessary skills to conduct research and improve performance in the field. Evaluating a doctoral dissertation defense program helps to identify the strengths and weaknesses of this process. Therefore, this comparative study has investigated th...
Improving EigenVoices-based techniques and SMLLR for Speaker Adaptation by combining EV and SMLLR techniques or using Genetic Algorithms
This paper constitutes a study of several classical and original methods for a speaker adaptation of the acoustic hidden Markov models of an automatic speech recognition system (ASRS). Most of today’s real applications require that the speaker adaptation process continuously improves the performance of the underlying ASRS, as more utterances are pronounced by a new speaker. The first part of th...